feat(core): update document in vector store #210
… with same source.id
…cuments with the same source
Trivy scanning results.
.venv/lib/python3.10/site-packages/PyJWT-2.9.0.dist-info/METADATA (secrets)
Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0)
MEDIUM: JWT (jwt-token)
.venv/lib/python3.10/site-packages/litellm/llms/huggingface_llms_metadata/hf_text_generation_models.txt (secrets)
Total: 1 (MEDIUM: 0, HIGH: 0, CRITICAL: 1)
CRITICAL: HuggingFace (hugging-face-access-token)
.venv/lib/python3.10/site-packages/litellm/proxy/_types.py (secrets)
Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0)
MEDIUM: Slack (slack-web-hook)
packages/ragbits-document-search/src/ragbits/document_search/_main.py
Code Coverage Summary
Diff against main
Results for commit: e84f110
…ries_with_same_sources
…dling_document_ingestion_with_different_content_and_verifying_replacement
packages/ragbits-core/tests/integration/vector_stores/test_chroma.py
lgtm
Description
This PR ensures that repeated ingestions of the same documents (identified by `source.id`) do not leave `VectorStoreEntries` from previous versions of those documents in the database. Before each ingestion, the `source.id` of each document to be ingested is checked, and all `VectorStoreEntries` with the same `source.id` are removed from the database. The ingestion then proceeds as usual.

- Implemented logic to check if the document's id is already present in `VectorStore`.
- Implemented the `remove()` method for `ChromaVectorStore`, `QdrantVectorStore`, and `InMemoryVectorStore`.
- Implemented 3 unit tests for each `VectorStore` to verify the functionality of the `remove()` methods.
- Implemented 3 integration tests for each `VectorStore` with the following steps:

Testing
Testing can be done by running any of the integration tests:
packages/ragbits-core/tests/integration/vector_stores/test_chroma.py
packages/ragbits-core/tests/integration/vector_stores/test_in_memory.py
packages/ragbits-core/tests/integration/vector_stores/test_qdrant.py
or the unit tests in the files below:
packages/ragbits-core/tests/unit/vector_stores/test_chroma.py
packages/ragbits-core/tests/unit/vector_stores/test_in_memory.py
packages/ragbits-core/tests/unit/vector_stores/test_qdrant.py
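
For reviewers who want a quick mental model of the behavior described in the Description, here is a minimal sketch of "remove entries with the same `source.id`, then ingest". It is not the ragbits implementation: `Entry`, `ToyVectorStore`, `all_entries()`, and `ingest()` are hypothetical names invented for this illustration, and the real `remove()` signatures in `ChromaVectorStore`, `QdrantVectorStore`, and `InMemoryVectorStore` may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a vector store entry (not the ragbits class)."""

    id: str
    source_id: str
    text: str


class ToyVectorStore:
    """Hypothetical in-memory store used only to illustrate the idea."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def store(self, entries: list[Entry]) -> None:
        # Insert or overwrite entries keyed by their id.
        for entry in entries:
            self._entries[entry.id] = entry

    def remove(self, ids: list[str]) -> None:
        # Unknown ids are ignored, so calling remove() twice is harmless.
        for entry_id in ids:
            self._entries.pop(entry_id, None)

    def all_entries(self) -> list[Entry]:
        return list(self._entries.values())


def ingest(store: ToyVectorStore, entries: list[Entry]) -> None:
    """Drop stored entries that share a source_id with the incoming ones, then store."""
    incoming_sources = {entry.source_id for entry in entries}
    stale_ids = [
        entry.id
        for entry in store.all_entries()
        if entry.source_id in incoming_sources
    ]
    store.remove(stale_ids)
    store.store(entries)


store = ToyVectorStore()
# First ingestion: two chunks of "doc-a" and one chunk of "doc-b".
ingest(
    store,
    [
        Entry("a1", "doc-a", "old chunk 1"),
        Entry("a2", "doc-a", "old chunk 2"),
        Entry("b1", "doc-b", "unrelated document"),
    ],
)
# Second ingestion: "doc-a" changed and now produces a single chunk.
ingest(store, [Entry("a3", "doc-a", "new chunk")])

remaining = sorted(entry.id for entry in store.all_entries())
print(remaining)  # ['a3', 'b1']
```

In this sketch the old `doc-a` chunks disappear even though their ids differ from the new chunk's id, which is why matching on the source rather than on the entry id matters; entries from other sources are left untouched.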